op {
  graph_op_name: "AllCandidateSampler"
  in_arg {
    name: "true_classes"
    description: <<END
A batch_size * num_true matrix, in which each row contains the
IDs of the num_true target_classes in the corresponding original label.
END
  }
  out_arg {
    name: "sampled_candidates"
    description: <<END
A vector of length num_sampled, in which each element is
the ID of a sampled candidate.
END
  }
  out_arg {
    name: "true_expected_count"
    description: <<END
A batch_size * num_true matrix, representing
the number of times each candidate is expected to occur in a batch
of sampled candidates. If unique=true, then this is a probability.
END
  }
  out_arg {
    name: "sampled_expected_count"
    description: <<END
A vector of length num_sampled, for each sampled
candidate representing the number of times the candidate is expected
to occur in a batch of sampled candidates.  If unique=true, then this is a
probability.
END
  }
  attr {
    name: "num_true"
    description: <<END
Number of true labels per context.
END
  }
  attr {
    name: "num_sampled"
    description: <<END
Number of candidates to produce.
END
  }
  attr {
    name: "unique"
    description: <<END
If unique is true, we sample with rejection, so that all sampled
candidates in a batch are unique. This requires some approximation to
estimate the post-rejection sampling probabilities.
END
  }
  attr {
    name: "seed"
    description: <<END
If either seed or seed2 are set to be non-zero, the random number
generator is seeded by the given seed.  Otherwise, it is seeded by a
random seed.
END
  }
  attr {
    name: "seed2"
    description: <<END
An second seed to avoid seed collision.
END
  }
  summary: "Generates labels for candidate sampling with a learned unigram distribution."
  description: <<END
See explanations of candidate sampling and the data formats at
go/candidate-sampling.

For each batch, this op picks a single set of sampled candidate labels.

The advantages of sampling candidates per-batch are simplicity and the
possibility of efficient dense matrix multiplication. The disadvantage is that
the sampled candidates must be chosen independently of the context and of the
true labels.
END
}
